Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3.x: Make TabletMap return empty replica set when stale replica is found #385

Conversation

Bouncheck
Copy link
Collaborator

Makes TabletMap return empty collection for some calls to getReplicas(...). If the current replica list for a tablet turns out to contain a host that is not present in current Metadata then empty collection is returned in an effort to misroute the query. This query will cause the tablet information to be updated when the result with the up to date information is received.
It is possible that the query will be randomly routed correctly, but this case is significantly less likely with each subsequent query, although this can depend on underlying load balancing policy.

This covers the node removal/replacement case of #378.

@Bouncheck Bouncheck requested a review from dkropachev November 26, 2024 20:32
@Bouncheck Bouncheck self-assigned this Nov 26, 2024
@Bouncheck Bouncheck force-pushed the scylla-3.x-remove-tablets-w-stale-hosts branch from 4caf8af to 7ee8323 Compare November 26, 2024 20:33
Copy link

@Lorak-mmk Lorak-mmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, why does this method return a set instead of some vector / list (in general: some ordered collection)? Order of the replicas matters for LWT.

Another note: this method is executed for each query, and requires allocations to be made to transform a list of hosts to lists of uuids. Why not do this transformation once, when receiving a tablet?

@Bouncheck
Copy link
Collaborator Author

Bouncheck commented Nov 26, 2024

I wonder, why does this method return a set instead of some vector / list (in general: some ordered collection)? Order of the replicas matters for LWT.

Another note: this method is executed for each query, and requires allocations to be made to transform a list of hosts to lists of uuids. Why not do this transformation once, when receiving a tablet?

Good points. I'll try to address both of them. When first implementing I likely did not think about LWT at all. Thanks for quick review.

@Bouncheck
Copy link
Collaborator Author

Bouncheck commented Nov 26, 2024

However all past getReplicas implementations in Metadata class were returning Sets, so maybe the ordering is preserved (does not seem to be the case). That is assuming that LWT worked generally correctly with TokenMap. Dmitry however showed me evidence that it may not be the case and previous LWT implementation could be bugged.

I think I was just matching the convention when creating the version for the tablets.

@dkropachev
Copy link
Collaborator

dkropachev commented Nov 27, 2024

@Bouncheck, feel free create a separate PR/issue on this problem or fix it in this PR. To me it would be better to address it in PR for #383, or in completely separate one.

@Bouncheck Bouncheck force-pushed the scylla-3.x-remove-tablets-w-stale-hosts branch from f7ea063 to 493b19d Compare November 27, 2024 16:58
@Bouncheck
Copy link
Collaborator Author

Another note: this method is executed for each query, and requires allocations to be made to transform a list of hosts to lists of uuids. Why not do this transformation once, when receiving a tablet?

I tinkered with this a bit and with current design translating when receiving the tablet won't save me from calling metadata.getHost(uuid) here anyway. I do that in TabletMap.getReplicas() to also confirm that the host can be seen by the driver.

The only place it is used, the surrounding code gets the Host instances from metadata again to translate from UUID to Host, so I've added a commit changing the return type to save on that translation.

@Bouncheck
Copy link
Collaborator Author

The switch to collections preserving ordering probably should be done not just for TabletMap and seems a bit bigger in scope so I'll leave that for another PR as suggested.

Makes TabletMap return empty collection for some calls to `getReplicas()`.
If the current replica list for a tablet turns out to contain a host
that is not present in current Metadata then empty collection is returned
in an effort to misroute the query. This query will cause the tablet
information to be updated when the result with the up to date information
is received.
It is possible that the query will be randomly routed correctly, but this
case is significantly less likely with each subsequent query, although
this can depend on underlying load balancing policy.

This covers the node removal/replacement case of scylladb#378.
@Bouncheck Bouncheck force-pushed the scylla-3.x-remove-tablets-w-stale-hosts branch from 493b19d to f05ba86 Compare November 28, 2024 15:52
Having this method return set of hosts instead saves one `Map#get()` query
per host.
In theory, previously it was also possible that between `TabletMap#getReplicas()`
returning and `Metadata#getReplicas()` translating uuids to hosts, some hosts could
become invalid, making the translation represent them as nulls.
@Bouncheck Bouncheck merged commit 7eb594d into scylladb:scylla-3.x Nov 28, 2024
10 of 11 checks passed
@Bouncheck Bouncheck changed the title Make TabletMap return empty replica set when stale replica is found 3.x: Make TabletMap return empty replica set when stale replica is found Dec 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants